Give _tkinter $ORIGIN-relative dependencies on glibc and an rpath on musl #745
Conversation
Giving up on testing this on musl for reasons noted in the comment, but the code to handle musl is there....
Partially addresses astral-sh#742 and makes it consistent with what we're doing on macOS. There's an argument in the comments above that we should not set an rpath on libpython (except on musl, where it's needed), and I need to see if I still believe that. In the meantime I'm following that pattern: setting $ORIGIN-relative NEEDED entries on glibc, and an rpath on musl only. This also adds a specific ldd regression test, but not the additional tests listed in astral-sh#742.
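In other words, the two strategies look roughly like this (a sketch with illustrative file names; the real build scripts accumulate these into a single batched patchelf invocation):

# glibc: point each DT_NEEDED entry directly at an $ORIGIN-relative path,
# leaving the rpath alone.
patchelf --replace-needed libtcl8.6.so '$ORIGIN/../../libtcl8.6.so' _tkinter.so
patchelf --replace-needed libtk8.6.so '$ORIGIN/../../libtk8.6.so' _tkinter.so

# musl: keep the plain DT_NEEDED names and add an $ORIGIN-relative rpath
# so the loader can find them.
patchelf --set-rpath '$ORIGIN/../..' _tkinter.so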
This block was just added in astral-sh#676 and isn't actually Tcl/Tk-specific.
Any word on this? I've run into it as well after upgrading from 3.12 to 3.13. I suspect it's only working "incidentally" at the moment, as most machines already have Tcl installed system-wide. But not my dockerised test machines...
Some feedback: I think it should be fine to just set DT_RUNPATH and leave the DT_NEEDED entries alone? I don't see the benefit of treating musl and glibc differently; why not unify on DT_RUNPATH?
Do keep in mind that on musl, LD_LIBRARY_PATH takes precedence over both DT_RUNPATH and DT_RPATH (the legacy/deprecated variant), and that when dlopen() is used, a parent rpath from the main executable, or LD_LIBRARY_PATH if set, appears to override any rpath set on the library itself. So it's more fragile there compared to loading linked libraries at init.
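That precedence is straightforward to check (a sketch; libfoo.so and the paths are hypothetical):

# Hypothetical library: bake in an rpath, then override it from the environment.
patchelf --set-rpath '$ORIGIN/deps' ./libfoo.so
# On musl, deps are searched via LD_LIBRARY_PATH before the rpath above:
LD_LIBRARY_PATH=/tmp/other /lib/ld-musl-x86_64.so.1 --list ./libfoo.so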
for lib in ${ROOT}/out/python/install/lib/*; do
  basename=${lib##*/}
  patchelf_args+=(--replace-needed "$basename" '${ORIGIN}/../../'"$basename")
done
Is this actually necessary? (Fedora 42 container)
$ ldd /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffcc89b4000)
libtcl8.6.so => not found
libtk8.6.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007220971fa000)
libc.so.6 => /lib64/libc.so.6 (0x0000722097008000)
/lib64/ld-linux-x86-64.so.2 (0x0000722097213000)
$ patchelf --set-rpath '$ORIGIN/../..' /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
$ ldd /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffd75f99000)
libtcl8.6.so => /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/../../libtcl8.6.so (0x000079bbb2be6000)
libtk8.6.so => /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/../../libtk8.6.so (0x000079bbb2930000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000079bbb292a000)
libc.so.6 => /lib64/libc.so.6 (0x000079bbb2738000)
libdl.so.2 => /lib64/libdl.so.2 (0x000079bbb2732000)
libm.so.6 => /lib64/libm.so.6 (0x000079bbb2644000)
/lib64/ld-linux-x86-64.so.2 (0x000079bbb2db6000)

You can see that just setting the rpath was sufficient; there was no need to be stricter by prefixing paths onto the deps / DT_NEEDED entries. Setting DT_RUNPATH provides additional search paths that are used in addition to whatever ldconfig --print-cache (for glibc; the musl equivalent differs) and LD_LIBRARY_PATH provide on the system at runtime resolution.
Both of these runtime configs can also be used to satisfy PyInstaller's discovery of libraries (the executable PyInstaller distributes will set LD_LIBRARY_PATH for its own process at runtime, pointing at its bundled libs), although the relative rpath is a better approach (and remains compatible).
For reference (Ubuntu 24.04 container), this is what PyTorch does as well (only adding extra search paths via RPath):
$ ldd .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
linux-vdso.so.1 (0x00007fffd2b98000)
libc10_cuda.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so (0x00007a2e73bc8000)
libcudart.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007a2e73800000)
libcusparse.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007a2e5c200000)
libcufft.so.11 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00007a2e4ae00000)
libcufile.so.0 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cufile/lib/libcufile.so.0 (0x00007a2e4ab03000)
libcusparseLt.so.0 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparselt/lib/libcusparseLt.so.0 (0x00007a2e2f993000)
libnccl.so.2 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x00007a2e16e00000)
libcurand.so.10 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00007a2e0e200000)
libcublas.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x00007a2e07000000)
libcublasLt.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007a2dd4c00000)
libcudnn.so.9 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 (0x00007a2dd4800000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007a2e73bb8000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007a2e73bb1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007a2e73bac000)
libc10.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libc10.so (0x00007a2e736ed000)
libtorch_cpu.so => /example/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so (0x00007a2dc0348000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007a2dc00ca000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007a2e73ac1000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007a2e736bf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007a2dbfeb8000)
/lib64/ld-linux-x86-64.so.2 (0x00007a2eaf861000)
libnvJitLink.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/../../nvjitlink/lib/libnvJitLink.so.12 (0x00007a2dba3d0000)
libgomp.so.1 => /example/.venv/lib/python3.10/site-packages/torch/lib/libgomp.so.1 (0x00007a2dba000000)
libcupti.so.12 => /example/.venv/lib/python3.10/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007a2db9892000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007a2e73aba000)

$ patchelf --print-needed .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
libc10_cuda.so
libcudart.so.12
libcusparse.so.12
libcufft.so.11
libcufile.so.0
libcusparseLt.so.0
libnccl.so.2
libcurand.so.10
libcublas.so.12
libcublasLt.so.12
libcudnn.so.9
librt.so.1
libdl.so.2
libpthread.so.0
libc10.so
libtorch_cpu.so
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
ld-linux-x86-64.so.2

$ patchelf --print-rpath .venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/cusparselt/lib:$ORIGIN/../../cusparselt/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN/../../nvidia/cufile/lib:$ORIGIN

// - Most things are "libxyz.so.1 => /usr/lib/libxyz.so.1 (0xabcde000)".
// - The ELF interpreter is displayed as just "/lib/ld.so (0xabcde000)".
// - glibc, but not musl, shows the vDSO as "linux-vdso.so.1 (0xfffff000)".
This is not true regarding glibc vs. musl for the vDSO; just look at the output I showed from the Ubuntu and Fedora containers with ldd. That's glibc, and it still includes linux-vdso.so.1 (these entries come in transitively via glibc).
Realistically, you should only be interested in the direct dependencies, as shown by patchelf --print-needed (which doesn't add tabbed indents); that output is what's actually encoded as DT_NEEDED entries.
Anything transitive is resolved at runtime, so ldd is useful for troubleshooting that dependency chain in a given runtime environment and filtering for "not found" if anything failed to resolve. Likewise, you could validate that the libraries resolved to the paths you expect.
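For example, contrasting the two views on the _tkinter module from the earlier session (a sketch; run from the install's lib/ directory):

# Direct deps only: exactly the DT_NEEDED entries, no transitive resolution.
patchelf --print-needed python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so

# Full runtime resolution, including transitive deps; filter for failures.
ldd python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so | grep 'not found'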
// - On glibc, if a library cannot be found, ldd returns zero and shows "=> not
//   found" as the resolution (even if it wouldn't use the => form if found).
// - On musl, if a library cannot be found, ldd returns nonzero and shows "Error
//   loading shared library ...:" on stderr.
On glibc you'll often find that ldd is a bash script (this varies by distro) wrapping the actual runtime dynamic linker, which you can call directly with --list to resolve libraries:
# `/usr/bin/ldd` is a bash script wrapper around this:
$ /lib64/ld-linux-x86-64.so.2 --list /lib64/libc.so.6
/lib64/ld-linux-x86-64.so.2 (0x00007f8ead016000)
linux-vdso.so.1 (0x00007fff355f6000)
# Failure example:
$ /lib64/ld-linux-x86-64.so.2 --list /root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so
/root/.local/share/uv/python/cpython-3.13.6-linux-x86_64-gnu/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-gnu.so: error while loading shared libraries: libtcl8.6.so: cannot open shared object file: No such file or directory

As you can see, the error message differs from the "not found" that the ldd wrapper script outputs.
Alpine has a very simple one for musl:
$ docker run --rm -it alpine
$ cat /usr/bin/ldd
#!/bin/sh
exec /lib/ld-musl-x86_64.so.1 --list "$@"

For a test that validates that the direct deps of a library/executable can be found, you should be able to use patchelf --print-needed to get the list of deps, then check those against the patchelf --print-rpath paths, followed by the system's standard search paths (a match against ldconfig --print-cache output should work; just don't use that on Alpine). That should be mostly glibc/musl agnostic, with exceptions like libc.so => /lib/ld-musl-x86_64.so.1.
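Something like the following could implement that check (a bash sketch under those assumptions: patchelf and ldconfig on PATH, glibc host, and only the unbraced $ORIGIN form in the rpath):

#!/bin/bash
# Sketch: check that every DT_NEEDED entry of a library resolves via its
# rpath or the loader cache. Glibc-oriented (`ldconfig -p`); a musl system
# would consult /etc/ld-musl-*.path instead.
lib=$1
origin=$(dirname "$(readlink -f "$lib")")
IFS=: read -ra rpath <<< "$(patchelf --print-rpath "$lib")"
cache=$(ldconfig -p | awk '{print $1}')   # first field is the soname
for dep in $(patchelf --print-needed "$lib"); do
    found=
    for dir in "${rpath[@]}"; do
        # Expand $ORIGIN the way the dynamic loader would.
        [[ -e "${dir//'$ORIGIN'/$origin}/$dep" ]] && found=1
    done
    grep -qx "$dep" <<< "$cache" && found=1
    [[ $found ]] || echo "missing: $dep"
done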
// TODO(geofft): musl doesn't do lazy binding for the argument to
// ldd, so we will get complaints about missing Py_* symbols. Need
// to handle this somehow, skip testing for now.
You can just filter out those lines:
$ /lib/ld-musl-x86_64.so.1 --list /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so 2>&1 | grep -v relocating
/lib/ld-musl-x86_64.so.1 (0x7453c8bed000)
Error loading shared library libtcl8.6.so: No such file or directory (needed by /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so)
Error loading shared library libtk8.6.so: No such file or directory (needed by /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so)
libc.so => /lib/ld-musl-x86_64.so.1 (0x7453c8bed000)

$ patchelf --print-needed /root/.local/share/uv/python/cpython-3.13.9-linux-x86_64-musl/lib/python3.13/lib-dynload/_tkinter.cpython-313-x86_64-linux-musl.so
libtcl8.6.so
libtk8.6.so
libc.so

if !output.status.success() {
    // TODO: If we ever have any optional dependencies besides libcrypt (which is
    // glibc-only), we will need to capture musl ldd's stderr and parse it.
    errors.push(format!(
        "`{ldd} {shared_lib}` exited with {}:\n{stdout}",
        output.status
    ));
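If that TODO ever becomes relevant, the stderr parsing could stay simple, building on the grep shown above (a sketch; $lib is a placeholder for the shared object under test):

# Merge the musl loader's stderr into stdout, drop the missing-symbol
# relocation noise, and keep only genuine load failures.
/lib/ld-musl-x86_64.so.1 --list "$lib" 2>&1 \
  | grep -v 'relocating' \
  | grep '^Error loading shared library'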
Perhaps instead of relying on the ldd CLI you could use an equivalent crate like elb-dl? It has support for resolving ELF dependencies under both the glibc and musl dynamic loaders: https://crates.io/crates/elb-dl
Implementing this logic via a library crate might be more consistent, without the caveats of handling the different programs' outputs.
There's a related crate for patchelf functionality too: https://docs.rs/elb